Markdown is one of the world’s most popular markup languages used in data science. Both R Markdown and Jupyter Notebooks use Markdown to provide a unified authoring framework for data science, combining code (R, Python, SQL,…), its results and commentary. The documents are fully reproducible and support dozens of output formats, like PDFs, Word files, slideshows, dashboards and more.
R Markdown Notebook
However, using Markdown doesn’t mean that you can’t also use Hypertext Markup Language (HTML). You can add HTML tags to any Markdown file.
According to Wickham & Grolemund (2016), R Markdown files are designed to be used in three ways:
1. For communicating to decision makers, who want to focus on the conclusions, not the code behind the analysis.
2. For collaborating with other data scientists, who are interested in both your conclusions, and how you reached them (i.e. the code).
3. As an environment in which to do data science, as a modern day lab notebook where you can capture not only what you did, but also what you were thinking.
Learn the most important basics of Markdown in this excellent interactive “60 Seconds Markdown Tutorial”. There are many options to discover; for example, this link will bring you back to the top of the page. For an overview of the various output formats of R Markdown documents, watch this short video from RStudio: “What is RMarkdown?”.
Furthermore, you can add your own Cascading Style Sheets (CSS) to change the style of your HTML document (CSS describes how HTML elements should be displayed) by using the css option in YAML (see section YAML).
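R Markdown documents can include one or more global parameters whose values can be set when you render the report. For example, the code below uses the country and year parameters to decide which rows to keep. Note that, despite its name, the country parameter holds a continent in this document ("Africa"), which is why it is compared with the continent column.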
africa_07 <-
  gapminder %>%
  filter(continent == params$country) %>%   # use parameter
  group_by(continent, year) %>%
  summarize(mean = round(mean(lifeExp), 2)) %>%
  filter(year == params$year) %>%           # use parameter
  pull(mean)                                # obtain result as number
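The values declared under params in the YAML header act as defaults; they can be overridden at render time through the params argument of rmarkdown::render(). The following is a minimal sketch, assuming the report is saved as report.Rmd and declares country and year parameters. The file name, the helper name render_report() and the output file naming scheme are hypothetical and only serve as illustration.

```r
# Sketch: render the same report for different parameter values.
# "report.Rmd" and render_report() are hypothetical names.
render_report <- function(country, year, input = "report.Rmd") {
  rmarkdown::render(
    input       = input,
    # values passed here override the defaults from the YAML header
    params      = list(country = country, year = year),
    output_file = paste0("report-", tolower(country), "-", year, ".html")
  )
}

# Example call (needs the rmarkdown package, pandoc and the file above):
# render_report("Asia", "1997")
```

Wrapping the call in a small function makes it easy to produce one report per continent or year, for example in a loop.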
In R Markdown it is also easy to integrate the results of R code in text elements. In particular, we can perform a data analysis like the one above and integrate the corresponding result (stored in africa_07) in Markdown comments. Instead of actually typing the result, we use the code `r africa_07` (read this post to learn how to display R code snippets in Markdown).
This code:
`r africa_07` years in 2007
renders to the computed mean life expectancy, followed by the text “years in 2007”.
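To see how an inline expression is evaluated without knitting a whole document, a short string can be passed to knitr::knit() via its text argument. This is only a sketch: the variable life_exp and its value are made-up stand-ins, not results from the gapminder data.

```r
library(knitr)

# Hypothetical stand-in for a computed result (made-up value)
life_exp <- 12.34

# knit() evaluates inline `r ...` expressions found in the text
src <- "Mean life expectancy: `r life_exp` years"
out <- knit(text = src, quiet = TRUE)

# Print the knitted text with the expression replaced by its value
cat(out, "\n")
```

Because the number is computed when the document is knitted, the sentence stays correct whenever the data or the parameters change.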
You can organize content using tabs by applying the {.tabset} class attribute to headers within a document. This will cause all sub-headers of the header with the .tabset attribute to appear within tabs rather than as standalone sections (learn more about the usage of tabs):
To create HTML documents from R Markdown, you first need to specify the html_document output format in the YAML metadata at the top of your document.
YAML (a recursive acronym for “YAML Ain’t Markup Language”) is a human-readable data-serialization language which is commonly used for configuration files and in applications where data is being stored or transmitted.
You can find an overview of all the YAML options for R Markdown in the excellent book “R Markdown: The Definitive Guide” (2019) by Yihui Xie, J. J. Allaire and Garrett Grolemund.
YAML metadata for this document:
---
title: "Write Reports in R Markdown"
author: "Prof. Dr. Jan Kirenz, `r params$institute`"
params:
  institute: "HdM Stuttgart"
  country: "Africa"
  year: "2007"
output:
  html_document:
    css: style.css            # define your own css
    df_print: paged           # tables are printed as HTML tables
    highlight: default        # syntax highlighting style
    number_sections: yes      # numbering of sections
    theme: paper              # style option
    fig_height: 4             # figure height
    fig_width: 8              # figure width
    toc: yes                  # table of content
    toc_float:
      collapsed: false        # show full toc
      smooth_scroll: true     # toc scrolling behavior
    includes:
      after_body: footer.html # include footer
---
If you want to use data or packages in multiple code chunks, it is good practice to load them once in a code chunk called setup right below the YAML-options. Furthermore, if a certain option needs to be frequently set to a value in multiple code chunks, you can consider setting it globally in the setup code chunk. To set global options that apply to every chunk in your file, call knitr::opts_chunk$set in a code chunk. Knitr will treat each option that you pass to knitr::opts_chunk$set as a global default that can be overwritten in individual chunk headers.
R setup code chunk of this document:
{r setup, include=FALSE}
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
library(tidyverse)
library(gapminder)
library(plotly)
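The global defaults set in the setup chunk can also be inspected from the R console. The sketch below assumes only that the knitr package is installed; the variable names are illustrative. It is meant for interactive exploration, not for inclusion in the report itself, because the last call undoes the settings.

```r
library(knitr)

# Set global defaults, as in the setup chunk above
opts_chunk$set(message = FALSE, warning = FALSE)

# Look up individual defaults
msg_after_set  <- opts_chunk$get("message")  # FALSE after the call above
echo_after_set <- opts_chunk$get("echo")     # untouched, keeps knitr's default

# Reset all chunk options to knitr's built-in defaults
opts_chunk$restore()
msg_after_restore <- opts_chunk$get("message")

print(c(set = msg_after_set, restored = msg_after_restore))
```

An option given in an individual chunk header still takes precedence over these global defaults.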
Chunk output can be customized with knitr options, which are arguments set in the {} of a chunk header:
- include = FALSE prevents code and results from appearing in the finished file. However, R Markdown still runs the code in the chunk, and the results can be used by other chunks.
- echo = FALSE prevents code, but not the results, from appearing in the finished file.
- eval = FALSE prevents code from running and only displays the code in a knitted document.
- message = FALSE prevents messages that are generated by code from appearing in the finished file.
- warning = FALSE prevents warnings that are generated by code from appearing in the finished file.
Notice that we already are able to create enhanced HTML tables via our df_print option in the YAML options:
(life_exp_07 <- gapminder %>%
  filter(year == 2007) %>%
  arrange(desc(lifeExp)))
You can also use the package kableExtra to build HTML-optimized tables and manipulate table styles. It imports the pipe symbol %>% and verbalizes all its functions, so you can add “layers” to a kable output in a way that is similar to ggplot2 and plotly:
library(kableExtra)
kable(head(life_exp_07, 6)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| country | continent | year | lifeExp | pop | gdpPercap |
|---|---|---|---|---|---|
| Japan | Asia | 2007 | 82.603 | 127467972 | 31656.07 |
| Hong Kong, China | Asia | 2007 | 82.208 | 6980412 | 39724.98 |
| Iceland | Europe | 2007 | 81.757 | 301931 | 36180.79 |
| Switzerland | Europe | 2007 | 81.701 | 7554661 | 37506.42 |
| Australia | Oceania | 2007 | 81.235 | 20434176 | 34435.37 |
| Spain | Europe | 2007 | 80.941 | 40448191 | 28821.06 |
The R package DT provides an R interface to the JavaScript library DataTables. R data objects (matrices or data frames) can be displayed as tables on HTML pages and DataTables provides filtering, pagination, sorting, and many other features in the tables. See this DT-Table documentation for an overview of the different options.
library(DT)
datatable(head(life_exp_07, 6),
          rownames = FALSE,
          filter = "top",
          colnames = c('Country',
                       'Continent',
                       'Year',
                       'Life Expectancy',
                       'Population',
                       'GDP per Capita'),
          caption = 'Table 1: Gapminder data overview')
The gt package is all about making it simple to produce nice-looking display tables:
library(gt)
set.seed(123)
life_exp_07 %>%
  slice_sample(n = 10) %>%
  gt()
| country | continent | year | lifeExp | pop | gdpPercap |
|---|---|---|---|---|---|
| Singapore | Asia | 2007 | 79.972 | 4553009 | 47143.1796 |
| Malaysia | Asia | 2007 | 74.241 | 24821286 | 12451.6558 |
| Burkina Faso | Africa | 2007 | 52.295 | 14326203 | 1217.0330 |
| Panama | Americas | 2007 | 75.537 | 3242173 | 9809.1856 |
| Swaziland | Africa | 2007 | 39.613 | 1133066 | 4513.4806 |
| Zambia | Africa | 2007 | 42.384 | 11746035 | 1271.2116 |
| Comoros | Africa | 2007 | 65.152 | 710960 | 986.1479 |
| India | Asia | 2007 | 64.698 | 1110396331 | 2452.2104 |
| Afghanistan | Asia | 2007 | 43.828 | 31889923 | 974.5803 |
| Mauritania | Africa | 2007 | 64.164 | 3270065 | 1803.1515 |
Same table with some adjustments:
set.seed(123)
life_exp_07 %>%
  slice_sample(n = 10) %>%
  group_by(continent) %>%       # grouped data gives one row group per continent
  gt() %>%
  tab_header(
    title = "Gapminder data overview",
    subtitle = "Data overview with the gt package"
  ) %>%
  tab_source_note(
    source_note = "Source: Gapminder"
  ) %>%
  # bare column names are used here; vars() is deprecated in recent gt versions
  fmt_currency(
    columns = gdpPercap,
    currency = "USD",
    decimals = 0
  ) %>%
  fmt_number(
    columns = lifeExp,
    decimals = 2
  ) %>%
  fmt_number(
    columns = pop,
    decimals = 2
  )
Gapminder data overview: Data overview with the gt package

| country | year | lifeExp | pop | gdpPercap |
|---|---|---|---|---|
| **Asia** | | | | |
| Singapore | 2007 | 79.97 | 4,553,009.00 | $47,143 |
| Malaysia | 2007 | 74.24 | 24,821,286.00 | $12,452 |
| India | 2007 | 64.70 | 1,110,396,331.00 | $2,452 |
| Afghanistan | 2007 | 43.83 | 31,889,923.00 | $975 |
| **Africa** | | | | |
| Burkina Faso | 2007 | 52.30 | 14,326,203.00 | $1,217 |
| Swaziland | 2007 | 39.61 | 1,133,066.00 | $4,513 |
| Zambia | 2007 | 42.38 | 11,746,035.00 | $1,271 |
| Comoros | 2007 | 65.15 | 710,960.00 | $986 |
| Mauritania | 2007 | 64.16 | 3,270,065.00 | $1,803 |
| **Americas** | | | | |
| Panama | 2007 | 75.54 | 3,242,173.00 | $9,809 |

Source: Gapminder
ggplot2 is a system for declaratively creating graphics. You provide the data, tell ggplot2 how to map variables to aesthetics and which graphical primitives to use, and it takes care of the details (ggplot2 documentation).
## data preparation
gap_continent <- gapminder %>%
  group_by(continent, year) %>%
  summarize(mean = round(mean(lifeExp), 2))
## create plot
p <- ggplot(gap_continent, aes(year, mean, color = continent)) +
  geom_line() +
  theme_classic() +
  ggtitle("Average Life Expectancy") +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text.x = element_text(angle = 45, hjust = 1),
        legend.title = element_blank())
## display plot
p
Plotly’s R graphing library makes interactive, publication-quality graphs, including line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple-axes, and 3D (WebGL-based) charts. An existing ggplot2 object can be converted with ggplotly():
library(plotly)
ggplotly(p)
Leaflet is one of the most popular open-source JavaScript libraries for interactive maps. It’s used by websites ranging from The New York Times and The Washington Post to GitHub and Flickr, as well as GIS specialists like OpenStreetMap, Mapbox, and CartoDB. For detailed information, visit the package documentation.
Once installed, you can use this package at the R console, within R Markdown documents, and within Shiny applications. If you would like to use leaflet in Shiny, review this tutorial.
library(leaflet)
# basic map centered on the campus
leaflet() %>%
  setView(lng = 9.102360, lat = 48.740760, zoom = 17) %>%
  addTiles()
# HTML content of the popup
content <- paste(
  sep = "<br/>",
  "<b><a href='https://www.hdm-stuttgart.de'>HdM Stuttgart</a></b>",
  "Nobelstraße 8",
  "70569 Stuttgart"
)
# same map with a popup at the given coordinates
leaflet() %>%
  setView(lng = 9.102360, lat = 48.740760, zoom = 17) %>%
  addTiles() %>%
  addPopups(9.101470, 48.741460, content,
    options = popupOptions(closeButton = FALSE)
  )
ggmap is an R package that makes it easy to retrieve raster map tiles from popular online mapping services like Google Maps and Stamen Maps and plot them using the ggplot2 framework:
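library("ggmap")
# bounding box of the contiguous United States
us <- c(left = -125, bottom = 25.75, right = -67, top = 49)
# Note: recent ggmap releases serve these tiles through get_stadiamap(),
# which requires a Stadia Maps API key; get_stamenmap() applies to older versions.
get_stamenmap(us, zoom = 5, maptype = "toner-lite") %>%
  ggmap()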
library(purrr)
# define function: the negation of %in%
`%not_in%` <- purrr::negate(`%in%`)
# prepare data: drop property crimes and keep a bounding box of coordinates
violent_crimes <- crime %>%
  filter(
    offense %not_in% c("auto theft", "theft", "burglary"),
    -95.39681 <= lon & lon <= -95.34188,
    29.73631 <= lat & lat <= 29.78400
  ) %>%
  mutate(
    offense = fct_drop(offense),
    offense = fct_relevel(offense, c("robbery", "aggravated assault", "rape", "murder"))
  )
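The custom operator simply negates %in%. As a small illustration, the sketch below builds the same operator with base R's Negate() instead of purrr::negate() (a swapped-in equivalent, not the code used in this document) and applies it to two made-up offense labels.

```r
# Base R equivalent of the operator defined above
`%not_in%` <- Negate(`%in%`)

# Property crimes to exclude, as in the filter above
excluded <- c("auto theft", "theft", "burglary")

# TRUE where the left-hand value is NOT among the excluded offenses
result <- c("robbery", "theft") %not_in% excluded
print(result)  # TRUE FALSE
```

Inside filter(), rows for which the operator returns TRUE are kept, so only offenses outside the excluded set remain.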
Plot data:
qmplot(lon, lat, data = violent_crimes,
       maptype = "toner-background",
       color = offense) +
  facet_wrap(~ offense)
Alternative plot:
qmplot(lon, lat, data = violent_crimes, geom = "blank",
       zoom = 14, maptype = "toner-background", darken = .7, legend = "topleft"
) +
  # after_stat(level) replaces the deprecated ..level.. notation (ggplot2 3.4.0 and later)
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon", alpha = .3, color = NA) +
  scale_fill_gradient2("Robbery\nPropensity", low = "white", mid = "yellow", high = "red", midpoint = 650)
© Jan Kirenz | Made with R Markdown
HdM Stuttgart